perm filename DBL6.TEX[AM,DBL]1 blob sn#543022 filedate 1980-11-19 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00013 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	\init{
C00004 00003	\chapbegin{6}		% Here beginneth Chapter 6
C00006 00004	\sectionbegin[1]{What AM Did}
C00012 00005	\subsectionbegin[1.1]{Linear Task-by-Task Summary of a Good Run}
C00046 00006	\subsectionbegin[1.2]{Two-Dimensional Behavior Graph}
C00053 00007	\subsectionbegin[1.3]{AM as a Computer Program}
C00060 00008	\sectionbegin[2]{Experiments with AM}
C00070 00009	\subsectionbegin[2.1]{Must the Worth numbers be finely tuned?}
C00081 00010	\subsectionbegin[2.2]{How Finely Tuned is the Agenda?}
C00089 00011	\subsectionbegin[2.3]{How Valuable is Tacking Reasons onto Each Task?}
C00094 00012	\subsectionbegin[2.4]{What if Certain Concepts are Eliminated or Added?}
C00100 00013	\subsectionbegin[2.5]{Can {\AM} Work in a New Domain: Plane Geometry?}\par
C00108 ENDMK
C⊗;
\init{
\input cmpdes
\input ammacs
\def\draft{T}
\titlepage
\newpage
\setcount0 103
\parttitle{AM: DISCOVERY IN MATHEMATICS AS HEURISTIC SEARCH}
}
\baselineskip 12pt
\chapbegin{6}		% Here beginneth Chapter 6
\chaptertitle{RESULTS}
\rjustline{\:B Results}
\runningrighthead{INTRODUCTION}
\tenpoint
\vskip 14pc

\epigraph{Now we have seen that mathematical work is not simply mechanical, that it
could not be done by a machine, however perfect. It is not merely a question
of applying rules, of making the  most combinations possible according to certain
fixed laws. The combinations so obtained would be exceedingly numerous, useless,
and cumbersome. The true work of the inventor consists in choosing among these combinations
so as to eliminate the useless ones or rather to avoid the trouble of making them,
and the rules which must guide this choice are extremely fine and delicate. It is
almost impossible to state them precisely; they are felt rather than formulated.
Under these conditions, how imagine a sieve capable of applying them 
mechanically?}
\author{Poincar\'e}
\epigraphend

\sectionbegin[1]{What AM Did}
{\AM} is both a mathematician of sorts, and a big computer program.

\listbegin

\bullistentry{By granting  
{\AM} more anthropomorphic  qualities than it  deserves, we
can  describe  its  progress  through  elementary  mathematics.    It
rediscovered many  well-known  concepts,  a  couple  interesting  but
not-generally-known ones,  and several  concepts which  were hitherto
unknown    and   should    have   stayed    that   way.       Section
1--5 recapped what {\AM} did, much
as  a  historian might  critically  evaluate Euler's  work.   
Instead of repeating  any of this descriptive prose here, we now
provide a
very brief listing of what {\AM} did in a single good run, task by task.
These task-by-task listings are not 
complete listings of every task
{\AM} ever attempted in any of its many runs, 
but rather a trace of a single, better-than-average run of
the \ffootnote{program.}{In fact,  
it is perhaps the best overall run. It occurred in two stages (due to
space problems; unimportant). In this particular run, {\AM} 
misses the few ``very best''  discoveries it ever made,
since the runs they occurred in went in somewhat different directions. It
also omits some of the more boring tasks: see, \eg,  the description
of task number 33. } The reader may wish to consult the brief
alphabetized glossary of concept names in the previous chapter.
Following this linear trace of {\AM}'s behavior is a more appropriate representation
of what it did: namely, a two-dimensional graph of that
same behavior as seen in ``concept-space.''}

\bullistentry{By under-estimating  
{\AM}'s sophistication,  one can demand  answers to
the typical questions to ask about a computer program: how big is it,
how much cpu time does it use, what language it's coded in, etc.  These
are   
found in   the third subsection (6--1.3).}

\listend
\subsectionbegin[1.1]{Linear Task-by-Task Summary of a Good Run}

\yskip

{\ninepoint \bf \parindent 0pt \parskip 1pt plus 1pt minus 0pt \setcount8 0 


\hangindent 30pt \hh   Fill in examples of Compose. Failed, but suggested next task:

\hangindent 30pt \hh   Fill in examples of Set-union. Also failed, but suggested:

\hangindent 30pt \hh   Fill  in  examples of  Sets. Many  found  (\eg, by  instantiating
Set.Defn) and then more derived from those examples (\eg, by running
Union.Alg).

\hangindent 30pt \hh   Fill in specializations of Sets (because it was very easy  to find
examples  of  Sets). Creation  of  new  concepts. One,  INT-Sets,  is
related to  ``Singletons.''  Another, ``BI-Sets,'' is all nests of braces
(no atomic elements).

\hangindent 30pt \hh   Fill in examples of INT-Sets. This indirectly led to a rise in the
worth of Equal.

\hangindent 30pt \hh   Check all examples of INT-Sets. All were confirmed.
{\AM} defines the set of Nonempty INT-Sets; this is renamed ``Singletons'' by the user.

\hangindent 30pt \hh   Check  all examples of  Sets.  To  check a couple  conjectures, {\AM}
will soon look for Bags and Osets.

\hangindent 30pt \hh   Fill in examples of Bags.

\hangindent 30pt \hh   Fill in specializations of Bags. Created INT-Bags (contain just one kind of
element), and BI-Bags (nests of parentheses).

\hangindent 30pt \hh   Fill in examples of Osets.

\hangindent 30pt \hh   Check examples of Osets.

\hangindent 30pt \hh   Fill in examples of Lists.

\hangindent 30pt \hh   Check examples of Lists.

\hangindent 30pt \hh   Fill in examples of All-but-first.

\hangindent 30pt \hh   Fill in examples of All-but-last.

\hangindent 30pt \hh   Fill in specializations of All-but-last. Failed.

\hangindent 30pt \hh   Fill in examples of List-union.

\hangindent 30pt \hh   Fill in examples of Proj1.

\hangindent 30pt \hh   Check examples of All-but-first.

\hangindent 30pt \hh   Check examples of All-but-last.

\hangindent 30pt \hh   Fill in examples of Proj2.

\hangindent 30pt \hh   Fill in examples of Empty-structures. 4 found.

\hangindent 30pt \hh   Fill in generalizations of Empty-structures. Failed.

\hangindent 30pt \hh   Check examples of List-union.

\hangindent 30pt \hh   Check examples of Bags. Defined Singleton-bags.

\hangindent 30pt \hh   Fill in examples of Bag-union.

\hangindent 30pt \hh   Check examples of Proj2.

\hangindent 30pt \hh   Fill in examples of Set-union.

\hangindent 30pt \hh   Check examples of Set-union. Define 
$\lambda (x,y) \ x\union y=x$, later called Superset.

\hangindent 30pt \hh   Fill in examples of Bag-insert.

\hangindent 30pt \hh   Check examples of Bag-insert. Range is really Nonempty bags. Isolate the results
of insertion restricted to Singletons: call them Doubleton-bags.

\hangindent 30pt \hh   Fill in examples of Bag-intersect.

\hangindent 30pt \hh   Fill in examples of Set-insert.

\hangindent 30pt \hh   Check examples of Set-insert. Range is always Nonempty sets.
Define $\lambda $ (x,S) Set-insert(x,S)=S; \ie, set membership.
Define Doubleton sets.

\hangindent 30pt \hh   Fill in examples of Bag-delete.

\hangindent 30pt \hh   Fill in examples of Bag-difference.

\hangindent 30pt \hh   Check examples of Bag-intersect. 
Define $\lambda (x,y) \ x\inter y=()$; \ie disjoint bags.

\hangindent 30pt \hh   Fill in examples of Set-intersect. 

\hangindent 30pt \hh   Check examples of Set-intersect. 
Define $\lambda (x,y) \ x\inter y=x$; \ie, subset.
Also define disjoint sets: $\lambda  (x,y)\ x\inter y=\{\}$.

\hangindent 30pt \hh   Fill in examples of List-intersect.

\hangindent 30pt \hh   Fill in examples  of Equal. Very difficult to  find examples; this
led to:

\hangindent 30pt \hh   Fill  in generalizations of Equal.  Define
``Same-size,'' ``Equal-CARs,'' and some losers.

\hangindent 30pt \hh   Fill in examples of Same-size.

\hangindent 30pt \hh   Apply an Algorithm for  Canonize to the args Same-size and  Equal.
{\AM} eventually synthesizes the canonizing function ``Size.''  {\AM} defines
the set of canonical structures: bags of T's; this later gets renamed
as ``Numbers.''

\hangindent 30pt \hh   Restrict  the  Domain/range  of Bag-union.    A new  operation  is
defined,  Number-union,  with Domain/range  entry  <Number Number  $\→$
Bag>.

\hangindent 30pt \hh   Fill in examples of Number-union.  Many found. 

\hangindent 30pt \hh   Check  the Domain/range of  Number-union.   Range is `Number'.
This operation is renamed ``Add2.''

\hangindent 30pt \hh   Restrict the Domain/range of  Bag-intersect to Numbers. Renamed
``Minimum.''

\hangindent 30pt \hh   Restrict  the Domain/range  of Bag-delete  to Numbers.   Renamed
``SUB1.''

\hangindent 30pt \hh    Restrict  the  Domain/range  of  Bag-insert  to  Numbers.     
{\AM} calls the new operation ``Number-insert.'' Its
Domain/range entry is <Anything Number $\→$ Bag>.

\hangindent 30pt \hh    Check  the  Domain/range  of  Number-insert.  This  doesn't  lead
anywhere.

\hangindent 30pt \hh    Restrict  the  Domain/range of  Bag-difference  to  Numbers. This
becomes ``Subtract.''

\eject
\hangindent 30pt \hh   Fill  in examples  of Subtract.  
This leads to defining the relation LEQ ($≤$\ffootnote{).}{If  a
larger number is ``subtracted'' from a smaller, the result is zero.  {\AM}
explicitly  defines the set  of ordered pairs of  numbers having zero
``difference.'' <x,y> is in that set iff x is less than or equal to y. }

\hangindent 30pt \hh   Fill in examples of LEQ. Many found.

\hangindent 30pt \hh   Check examples of LEQ. 

\hangindent 30pt \hh   Apply algorithm of Coalesce to LEQ.  LEQ(x,x) is Constant-True.

\hangindent 30pt \hh      Fill    in     examples    of     Parallel-join2.    Included     is
Parallel-join2(Bags,Bags,Proj2),  which is renamed ``TIMES,''
and
Parallel-join2(Structures,Structures,Proj1), a generalized
Union operation renamed ``G-Union,'' and a bunch of losers.



\hangindent 30pt \hh--{\669}.  Fill  in  
and check  examples  of  the  operations just
created.

\setcount8 69

\hangindent 30pt \hh   Fill in examples of Coalesce.  Created: Self-Compose, Self-Insert,
Self-Delete, Self-Add,  Self-Times, Self-Union, etc.   Also:
Coa-repeat2, Coa-join2, etc.

\hangindent 30pt \hh   Fill in examples of Self-Delete. Many found.

\hangindent 30pt \hh    Check examples  of  Self-Delete.  Self-Delete is just Identity-op.

\hangindent 30pt \hh   Fill in examples of Self-member. No positive examples found.

\hangindent 30pt \hh    Check examples  of  Self-member.  Self-member is just Constant-False.

\hangindent 30pt \hh   Fill in examples of Self-Add. Many found. User renames this ``Doubling.''

\hangindent 30pt \hh   Check examples of Coalesce. Confirmed.

\hangindent 30pt \hh   Check examples of Add2. Confirmed.

\hangindent 30pt \hh   Fill in examples of Self-Times. 
Many found. 
Renamed ``Squaring'' by the user.

\hangindent 30pt \hh    Fill  in  examples   of  Self-Compose. 
Defined Squaring$\circ$Squaring.
Created Add$\circ$Add (two versions: Add2$↓1$ which is
$\lambda  (x,y,z)\ (x+y)+z$,  and Add2$↓2$  which  is $x+(y+z)$).
Similarly, two
versions of Times$\circ$Times and of
Compose$\circ$Compose. 

\hangindent 30pt \hh   Fill in examples of Add2$↓1$ (\ie, $(x+y)+z$). 
Many are found.

\hangindent 30pt \hh   Fill in examples of Add2$↓2$ (i.e, $x+(y+z)$). Again many are found.

\hangindent 30pt \hh   Check examples of Squaring. Confirmed. 

\hangindent 30pt \hh   Check examples of Add2$↓2$.   Add2$↓1$ and Add2$↓2$ appear  equivalent. But
first:

\hangindent 30pt \hh   Check examples of Add2$↓1$.  Add2$↓1$ and Add2$↓2$ still appear equivalent.
Merge them.  So the proper argument for a generalized ``Add'' operation
is a Bag.

\hangindent 30pt \hh   Apply algorithm for Invert to argument `Add'. 
Define Inv-add(x) as
the set of all bags of numbers (>0) whose sum is x. Also denoted Add\inv(x).

\hangindent 30pt \hh   Fill in examples of TIMES2$↓1$. (xy)z. Many are found.

\hangindent 30pt \hh   Fill in examples of TIMES2$↓2$. x(yz). Again many are found.

\hangindent 30pt \hh   Check examples of TIMES2$↓2$.  
TIMES2$↓1$ and TIMES2$↓2$ may be equivalent.

\hangindent 30pt \hh   Check  examples of  TIMES2$↓1$.   TIMES2$↓1$  and TIMES2$↓2$  still  appear
equivalent. Merge  them.   So the proper  argument for  a generalized
``TIMES'' operation  is a Bag. Set up an analogy between TIMES and ADD,
because  of  this  fact.   

\hangindent 30pt \hh     Apply  algorithm   for  Invert   to  argument   `TIMES'.  Define
Inv-TIMES(x) as the set of all bags  of numbers (>1) whose product is x.  
Also denoted TIMES\inv.  Analogic to Inv-Add.

\hangindent 30pt \hh     Fill    in   examples   of    Parallel-replace2.       Included   are
Parallel-replace2(Bags,Bags,Proj2) (called MR2-BBP2), and many losers.


\hangindent 30pt \hh--{\6107. }  Fill  in  and 
check  examples  of  the operations  just
created.

\setcount8 107 

\hangindent 30pt \hh   Fill  in examples  of Compose.   
So easy that {\AM} creates Int-Compose.

\hangindent 30pt \hh   Fill in examples  of Int-Compose.  
The  two chosen operations  G,H
must be  such that range(H) is a component (subset) of domain(G),  
and range(G) is a component of domain(H);  both G
and  H must  be interesting.
Create G-Union$\circ$\ffootnote{MR2-BBP2,}{An alternate
derivation of  the operation of  multiplication. }
Insert$\circ$Delete, Times$\circ$Squaring, etc.


\hangindent 30pt \hh--{\6127. } Fill  in and  check examples  of  the compositions  just
created.   Notice that  G-Union$\circ$MR2-BBP2 is just
TIMES.

\setcount8 127 

\hangindent 30pt \hh      Fill    in     examples    of    Coa-repeat2.    Among    them:
Coa-repeat2(Bags-of-Numbers, Add2)     [{\sl multiplication again!}],
Coa-repeat2(Bags-of-Numbers, Times) [{\sl exponen\-ti\-a\-tion}],  
Coa-repeat2(Structures, Proj1) [CAR],
Coa-repeat2(Structures, Proj\-2) [{\sl Last-element-of}], etc.

\hangindent 30pt \hh   Check the examples of Coa-repeat2. All confirmed.

\hangindent 30pt \hh   Apply algorithms for Invert to `Doubling'.
The result is called ``Halving'' by the user.
{\AM} then defines ``Evens.''

\hangindent 30pt \hh   Fill in examples of Self-Insert.

\hangindent 30pt \hh    Check examples of  Self-Insert. Nothing special  found. 

\hangindent 30pt \hh   Fill in examples of  Coa-repeat2-Add2. 

\hangindent 30pt \hh   Check examples of  Coa-repeat2-Add2. It's the same as TIMES.

\hangindent 30pt \hh     Apply  algorithm   for  Invert   to  argument   `Squaring'.  Define
``Square-root.''

\hangindent 30pt \hh   Fill in examples of Square-root. Some found, but very inefficiently.

\hangindent 30pt \hh   Fill in new algorithms for Square-root. Had to ask user for a good one.

\hangindent 30pt \hh   Check examples of Square-root. Define the set of numbers
``Perfect-squares.''

\hangindent 30pt \hh   Fill in examples of Coa-repeat2-Times. This is exponentiation.

\hangindent 30pt \hh   Check  examples  of Coa-repeat2-Times.  Nothing  special  noticed,
unfortunately.

\hangindent 30pt \hh   Fill in examples of Inv-TIMES.
Many found, but inefficiently.

\hangindent 30pt \hh   Fill in new algorithms for Inv-TIMES. Obtained opaquely from the user.

\hangindent 30pt \hh   Check examples of Inv-TIMES. This task suggests the next one:

\hangindent 30pt \hh   Compose G-Union with  Inv-TIMES. Good Domain/range. Renamed
``Divisors.''

\hangindent 30pt \hh   Fill in examples of Divisors. Many found, but not very efficiently.

\hangindent 30pt \hh   Fill in new algorithms for Divisors. Obtained from the user.

\hangindent 30pt \hh   Fill in examples of Perfect-squares. Many found.

\hangindent 30pt \hh   Fill in specializations of TIMES. 
Times1(x) $=↓{df} 1\cdot x$, Times2(x) $=↓{df} 2\cdot x$,
Times-sq is TIMES with its domain restricted to bags of perfect squares, Times-ev
takes only even arguments,
Times-to-evens requires that the result be even, Times-to-sq, $\cdots $

\hangindent 30pt \hh   Check examples of Divisors.
Define 0-Div, 1-Div, 2-Div, and 3-Div, the sets of numbers
whose Divisors value is the empty set, a singleton, a doubleton, and a tripleton,
respectively. 

\hangindent 30pt \hh   Fill in examples of 1-Div. 
Only one example found: ``1.'' 
Lower 1-Div.Worth.

\hangindent 30pt \hh   Fill in examples of 0-Div.
None found. Lower the worth of this concept.

\hangindent 30pt \hh   Fill in examples of 2-Div.
A nice number of examples are found. Raise 2-Div.Worth.

\hangindent 30pt \hh   Check examples of 2-Div. All confirmed,
but no pattern noticed.

\hangindent 30pt \hh   Fill in examples of 3-Div. A nice number found.

\hangindent 30pt \hh   Check examples of 3-Div. All confirmed. All are perfect squares.

\hangindent 30pt \hh   Restrict Square-root to numbers which are in 3-Div. Call this Root3.

\hangindent 30pt \hh   Fill in examples of Root3. Many found.

\hangindent 30pt \hh   Check examples of Root3. All confirmed. All are in 2-Div. 
Raise their worths.

\hangindent 30pt \hh   Restrict Squaring to 2-divs. Call the result Square2.

\hangindent 30pt \hh   Fill in examples of Square2. Many found.

\hangindent 30pt \hh   Check the range of Square2. Always 3-Divs.
Conjecture: x has 2 divisors iff x$↑2$ has 3 divisors.

\hangindent 30pt \hh   Restrict Squaring to 3-Divs. Call the result Square3.

\hangindent 30pt \hh   Restrict Square-rooting to 2-Divs. Call the result Root2.

\hangindent 30pt \hh   Fill in examples of Square3. Many found.

\hangindent 30pt \hh   Compose Divisors-of and Square3. Call the result Div-Sq3.

\hangindent 30pt \hh   Fill in examples of Div-Sq3. Many found.

\hangindent 30pt \hh   Check examples of Div-Sq3.
All such examples are Same-size.


\hangindent 30pt \hh--{\6175. } More confirmations and explorations of the above conjecture.
Gradually, all its ramifications lead to dead-ends (as far as {\AM} is concerned).

\setcount8 175

\hangindent 30pt \hh   Fill in examples of Root2. None found.
Conjecture that there are none.

\hangindent 30pt \hh   Check examples of Inv-TIMES. Inv-TIMES always contains a singleton bag, and
always contains a bag of primes. 

\hangindent 30pt \hh   Restrict the range of Inv-TIMES to bags of primes. Call this
Prime-Times.

\hangindent 30pt \hh   Restrict the range of Inv-TIMES to singletons. Called
Single-Times.

\hangindent 30pt \hh   Fill in examples of Prime-times. Many found.

\hangindent 30pt \hh   Check examples of Prime-times. Always
a singleton set. 
User renames this conjecture {\it ``The unique factorization theorem.''}

\hangindent 30pt \hh   Fill in examples of Single-TIMES. Many found.

\hangindent 30pt \hh   Check examples of Single-TIMES. Always
a singleton set.
Single-TIMES is actually the same as Bag-insert!

\hangindent 30pt \hh   Fill in examples of Self-set-union. Many found.

\hangindent 30pt \hh   Check examples of Self-set-union. This operation is same as 
Identity.

\hangindent 30pt \hh   Fill in examples of Self-bag-union. Many found.

\hangindent 30pt \hh   Check examples of Self-bag-union. Confirmed. Nothing interesting noticed.

\hangindent 30pt \hh   Fill in examples of Inv-ADD.

\hangindent 30pt \hh   Check examples of Inv-ADD. Hordes of boring conjectures, so:

\hangindent 30pt \hh   Restrict the domain of Inv-ADD to primes (Inv-Add-primes), to evens
(Inv-Add-evens), to squares, etc.

\hangindent 30pt \hh   Fill in examples of Inv-add-primes. Many found.

\hangindent 30pt \hh   Check examples of Inv-add-primes. Confirmed, but nothing special noticed.

\hangindent 30pt \hh   Fill in examples of Inv-add-evens. Many found.

\hangindent 30pt \hh   Check examples of Inv-add-evens. Always contains
a bag of primes.

\hangindent 30pt \hh   Restrict the range of Inv-Add-evens to bags of primes. Called
Prime-ADD.

\hangindent 30pt \hh   Restrict the range of Inv-ADD to singletons. Call that new operation
Single-ADD.

\hangindent 30pt \hh   Fill in examples of Prime-ADD. Many found.

\hangindent 30pt \hh   Check examples of Prime-ADD. Always
a nonempty set (of bags of primes). 
User renames this conjecture {\it ``Goldbach's conjecture.''}

\hangindent 30pt \hh   Fill in examples of Single-ADD. Many found.

\hangindent 30pt \hh   Check examples of Single-ADD. Always
a singleton set. This operation is the same as
Bag-insert and Single-TIMES.

\hangindent 30pt \hh   Restrict the range of Prime-ADD to singletons, by analogy to
\ffootnote{Prime-TIMES.}{In this case, {\AM} is asking which numbers are uniquely representable
as the sum of two primes.} Call the new operation Prime-ADD-SING.

\hangindent 30pt \hh   Fill in examples of Prime-ADD-SING. Many found.


\hangindent 30pt \hh   Check examples of  Prime-ADD-SING. Nothing special noticed.

\hangindent 30pt \hh   Fill in examples of \footnote{Times-sq.}{Recall that this is just TIMES restricted
to operate on perfect squares. } Many examples found.

\hangindent 30pt \hh   Check Domain/range of Times-sq. Is the range actually Perfect-squares?
Yes!

\hangindent 30pt \hh   Fill in examples of Times1. 
Recall that Times1(x) $=↓{df} $ TIMES(1,x) = 1$\cdot $x = x.

\hangindent 30pt \hh   Check examples of Times1. 
Apparently just a restriction of Identity.

\hangindent 30pt \hh   Check examples of Times-sq. Confirmed.

\hangindent 30pt \hh   Fill in examples of Times0. 

\hangindent 30pt \hh   Fill in examples of Times2.

\hangindent 30pt \hh   Check examples of Times2. Apparently the same as Doubling.
That is, $x+x=2\cdot x$. Very important.
By analogy, define Ad2(x) as x+2.

\hangindent 30pt \hh   Fill in examples of Ad2.

\hangindent 30pt \hh   Check examples of Ad2. Nothing interesting noticed.

\hangindent 30pt \hh   Fill in specializations of Add. Among those created are:
Add0 (x+0), Add1, Add3, ADD-sq (addition restricted to perfect squares),
Add-ev (sum of even numbers), Add-pr (sum of primes), etc.

\hangindent 30pt \hh   Check examples of Times0. The value always seems to be 0.

\hangindent 30pt \hh   Fill in examples of \footnote{Times-ev.}{Recall 
that Times-ev is just like
TIMES restricted to operating on even numbers. } Many examples found.

\hangindent 30pt \hh   Check examples of Times-ev. Apparently all the results are Evens.

\hangindent 30pt \hh   Fill in examples of \footnote{Times-to-ev.}{That is, consider bags of numbers
which multiply to give an even number. } Many found.

\hangindent 30pt \hh   Fill in examples of Times-to-sq. Only a few found.

\hangindent 30pt \hh   Check examples of Times-to-sq. All arguments always seem to be squares.
Conjec: Times-to-sq is really the same as Times-sq. Merge the two.
This is a false conjecture, but did {\AM} no harm.

\hangindent 30pt \hh   Check examples of Times-to-ev. The domain always contains an even number.

\hangindent 30pt \hh   Fill in examples of Self-Union.

\hangindent 30pt \hh    Check examples of  Self-Union. 

\hangindent 30pt \hh   Fill in examples of SubSet.

\hangindent 30pt \hh   Check example  of SubSet.

\hangindent 30pt \hh   Fill in examples of SuperSet.


\hangindent 30pt \hh   Check examples of SuperSet. 
Conjec: Subset(x,y) iff Superset(y,x). Important.

\hangindent 30pt \hh   Fill in examples of Compose$\circ$Compose-1. {\AM} creates
some explosive combination  (\eg, 
(Compose$\circ$Compose)$\circ$(Compose$\circ$Compose)$\circ$(Compose$\circ$Compose)),
some poor ones (\eg, Square$\circ$Count$\circ$ADD\inv),
and even a few---very few---winners (\eg, 
SUB1$\circ$Count$\circ$Self-Insert, which is later recognized to coincide with
Count).

\hangindent 30pt \hh   Check examples of Compose$\circ$Compose-1.

\hangindent 30pt \hh   Fill in examples of Compose$\circ$\ffootnote{Compose-2.}{Recall that the difference
between this operation and the last one is merely in the order of the composing:
F$\circ$(G$\circ$H) versus (F$\circ$G)$\circ$H. } 
{\AM} recreates many of the previous tasks' operations.

\hangindent 30pt \hh   Check examples of Compose$\circ$Compose-2. 
Nothing noticed \footnote{yet.}{Later on,
{\AM} will use these new operations to discover the associativity of Compose. }


\hangindent 30pt \hh--{\6252. }  Fill  in  and check  examples  of  the  losing compositions just
created.

\setcount8 252

\hangindent 30pt \hh   Fill in examples of Add-sq (\ie, sum of squares).

\hangindent 30pt \hh   Check Domain/range entries of Add-sq. The range is not always perfect squares.
Define Add-sq-sq(x,y),
which is True iff x and y are perfect squares and their sum is a perfect
square as well.

\hangindent 30pt \hh   Fill in examples of Add-pr; \ie, addition of primes.

\hangindent 30pt \hh   Check Domain/range entries of Add-pr. {\AM} defines the set of pairs of primes
whose sum is also a prime. This is a bizarre derivation of prime pairs.

\tenpoint \rm \parindent 20pt \parskip 0pt }

\subsectionbegin[1.2]{Two-Dimensional Behavior Graph}
On the next two pages is a graph of the same ``best run'' which {\AM} executed.
The nodes are concepts, and the links are actions which {\AM} performed.
Labels on the links indicate when each action was taken, so the reader
may observe how {\AM} jumped around. It should also be
easy to perceive from the graph
which paths of development were abandoned, which concepts ignored, and which ones
concentrated upon. These are precisely the features of {\AM}'s behavior which are
awkward to infer from a simple linear trace (as in the previous section).

In more detail, here is how to read the graph:
Each node is a concept. 
To save space, these names are often highly abbreviated. For example,
``x0'' is used in place of ``TIMES-0.'' 

Each concept name is surrounded by from zero to four numbers:

\han6{\6 318  \hskip 25pt 288}

\han6{\bf  \ WIDGETING}

\han6{\6 310  \hskip 25pt 291}

\noindent The upper right number indicates the task number (see last section) during which
examples of this concept were filled in. The lower right number tells when they
were checked. 
The upper left number indicates when the Domain/range facet of that
concept was modified. Finally, the lower left number is the task number during
which some new Algorithms for that concept were obtained.
A number in parentheses  indicates that the task with that number was a total
failure.

Because of the limited space, it was decided that if a concept were ever
renamed by the user, then only that newer, mnemonic name would be given in
the diagram. Thus there is an arrow from ``Coalesce'' to ``\4Square\1,'' an operation
originally called ``Self-Times'' by {\AM}.

Sometimes, a concept will have under it a note of the form \6$≡$
GROK\1.
This simply means that {\AM} eventually discovered that the concept was
equivalent to the already-known concept ``GROK,'' and probably forgot about
this one (merged it into the one it already knew about).
The ``trail'' of discovery may pick up again at that preexisting concept.
A node written as \6=GROK\0 means that the concept was really the same as
``GROK,'' but {\AM} never investigated it enough to notice this.

The arrows indicate the creation of new concepts. Thus an arrow leading
to concept ``Widgeting'' indicates how that concept was created. An arrow directed
away from Widgeting 
points to a concept created as, \eg, a specialization or an
example of Widgeting. No arrowheads are 
in practice necessary:
all arrows are directed \4downward\1.

The arrows may be labeled, indicating precisely what they represent (\eg,
composition, restriction) and what the task number was when they occurred.
For space reasons, the following convention has proven necessary: 
if an arrow emanating from C is unnumbered, 
it is assumed to have occurred at the same time
as the arrow to its immediate left which also points from C;
if all the arrows
emanating from C have no number, than all their times of occurrence are
assumed to be the \ffootnote{{\sl lower right}}{This is often
true because many concepts are created
while checking examples of some known concept.}
number of C.  
Finally, if C has no lower right number, the arrow is assumed to have the
value of the upper right number of C.

An unlabeled arrow is assumed to be an act of
Specialization or the creation of
an \footnote{Example.}{It
should be clear in each context which is happening. If not, refer to the
short trace in the preceding section, and    look up the appropriate task number.}
Labels, when  they do occur, are given in capitals and small letters;
concept names (nodes) are by contrast in all capitals.
All the numbers correspond to those given to the tasks
in the task-by-task traces presented
in the last section.


The first part of this graph (presented below) contains static  structural
(and ultimately numerical) concepts which were studied by {\AM}:

\vfill

TO THE PRINTER:  PLEASE PASTE THE SMALL GRAPH FIGURE IN HERE!

\vfill

The rest of the graph (presented on the next page) deals with activities which were
investigated:

\newpage

\vfill

TO THE PRINTER:  PLEASE PASTE THE LARGE GRAPH FIGURE IN HERE!

\vfill

\eject


\subsectionbegin[1.3]{AM as a Computer Program}
When viewed as a large LISP program, there is very little of interest about
{\AM}. There are the usual battery of customized functions (\eg, a conditional
PRINT function), the storage hacks (special emergency garbage collection
routines, which know which facets are expendible), the time hacks
(omnisciently arrange clauses in a conjunction so that the one most likely
to fail will come first), and the bugs (if the user renames a concept
while it's the current one being worked on, 
there is a 5\%\ chance of {\AM} entering an infinite loop).

Below are listed a few parameters of the system, although I doubt that
they hold any theoretical significance. The reader may be curious about how
big {\AM} is, how long it takes to execute, etc.



Machine: SUMEX, PDP-10, KI-10 uniprocessor, 256k core memory.

Language: Interlisp, January 1975 release, 
which occupies 140k of the total 256k, 
but which provides a surplus  ``shadow space''
of 256k additional words available for holding compiled code.

{\AM} support code: 
200 compiled (not block-compiled) utility routines, control routines, etc.
They occupy roughly 100k, but all are pushed into the shadow space.

{\AM} itself: 115 concepts, each occupying about .7k
(about two typed pages, when Pretty-printed with indentation).
Facet/entries stored as
property/value on the property list of atoms whose names are concepts'
\ffootnote{names.}{Snazzy feature:
Executable entries on facets  (\eg, an entry on Union.Alg) are stored uncompiled
until the first time they are actually called on, at which time they are compiled
and then executed. }
Each concept has about 8 facets filled in.

Heuristics are tacked onto the facets of the concepts. The more general
the concept, the more heuristic rules it has attached to \footnote{it.}{This
was not done consciously,
and may or may not hold some theoretical significance. }
``Any-concept'' has 121 rules; ``Active concept'' has 24; ``Coalesce'' has 7;
``Set-Insertion'' has none. There are 250 heuristic rules in all, divided into
four flavors (Fillin, Check, Suggest, Interestingness).
Although the mean number of rules is therefore only about 2.2 (\ie, 
less than 1 of each flavor) per
concept,
the standard deviation of this is a whopping 127.4. The average number of heuristics
(of a given flavor)
encountered rippling upward from a randomly-chosen concept C (along the
network of generalization links) is about 35, even though  the mean path length
is only about \footnote{4.}{If the heuristics were homogeneously distributed among the
concepts, the number of heuristics (of a given type) along a typical path 
of length 4 would only be about 2, not 35.  If all the heuristics were tacked
onto Anything and Any-concept, the number encountered in any path would be
75, not 35. } 

The total number of jobs executed in a typical run (from scratch) is about 200.
The run ends because of space problems, but {\AM}'s performance begins to degrade
near the end anyway.

``Final'' state of {\AM}: 300 concepts, each occupying about 1k. Many are swapped out
onto disk. 
Number of winning concepts discovered: 25 (estimated).
Number of acceptable concepts defined: 100 \ffootnote{(est.).}{For a list of
``winners'' and ``acceptables,'' see the final section in Appendix 1. }
Number of losing concepts unfortunately worked on: 60 (est.).
The original 115 concepts have grown to an average size of 2k.
Each concept has about 11 facets filled in.

About 30 seconds of cpu time were allocated to each task, on the average,
but the task typically used only about 18 seconds before quitting.
Total CPU time for a run is about 1 hour. Total cpu time consumed by this
research project was about 500 cpu hours.

Real time: about 1 minute per task, 2 hours per run. 
The idea for {\AM} was formulated in the fall of 1974, and {\AM} was coded in the
summer of 1975.
Total time consumed
by this project to date has been about 2600 man-hours: 700 for planning,
500 for coding, 600 for modifying and debugging and experimenting,
700 for writing the thesis, and 100 for editing it into book form.

{\AM} was working by itself:
it  received no  help  from the  user, and  all its
concepts' Intuitions facets had been removed.

\sectionskip
\sectionbegin[2]{Experiments with AM}
One valuable aspect of  {\AM} is  that it  is amenable  to many  kind of
interesting experiments.  Although {\AM} is too {\it ad hoc} for numerical
results to have much significance, the qualitative results perhaps do
have  some   valid  things  to  say  about   research  in  elementary
mathematics, about  automating  research,  and  at  least  about  the
efficacy of various parts of {\AM}'s design.

This section will explain  what it means to perform  an experiment on
{\AM},  what kinds  of experiments  are imaginable,  which of  those are
feasible, and  finally will  describe the many experiments which  were
performed on {\AM}.

By modifying {\AM} in various ways, its behavior can be altered, and the
\4quality\0  of  its  behavior  will change  as  well.  As  a drastic
example, one experiment involved forcing {\AM} to select the next task to work on
\4randomly\0 from the agenda, not the top task each time. Needless to say,
the performance was very different from usual.

By careful planning, each experiment can tell us something new  about
{\AM}: how  valuable a  certain piece  of it  is, how  robust a  certain
scheme  really is, etc.   The results of these  experiments would then
have  something  to   contribute  to  a   discussion  of  the   ``real
intelligence''  of  {\AM}  (\eg,  what features  were  superfluous),  and
contribute to  the design of the ``next'' {\AM}-like system.  Generalizing
from those  results,  one might  suggest  some hypotheses  about  the
larger task of automated math research.

Let's cover the different  \4kinds\0 of experiments one could perform
on {\AM}:

\listbegin

\numlistentry[1]{Remove individual  concept modules,  and/or individual  heuristic
rules. Then  examine how  {\AM}'s performance  is degraded.   {\AM}  should
operate even  if most of its heuristic rules  and most of its concept
modules were excised.
If the remaining  fragment of {\AM} is too small, however,
it may  not be able to find anything interesting to do. In fact, this
situation was actually encountered experimentally, when the first few
partially complete concepts were inserted. If only a little bit of {\AM}
is removed, the remainder  will in fact  keep operating without  this
``uninteresting collapse.''   The converse situation should  also hold:
although  still functional  with any  concept module  unplugged, {\AM}'s
performance \4should\0  be  noticeably degraded.  That is,  while  not
indispensable, each concept should nontrivially help the others.  The
same  holds for each individual heuristic  rule.  
When a piece of {\AM} is removed,
which concepts does
{\AM} then ``miss'' discovering?  Is the  removed concept/heuristic  later
discovered  anyway  by those  which  are left  in  {\AM}?   This  should
indicate the importance  of each kind  of concept and  rule with which  {\AM}
starts.}

\numlistentry[2]{Vary  the relative  weights given  to features  by the  criteria
which  judge aesthetics, interestingness,  worth, utility,  etc.  See
how important each factor is in directing {\AM} along successful routes.
In other  words, vary the  little numbers  in the formulae  (both the
global priority-assigning formula and  the local  
reason-rating
ones inside  heuristic rules).   One
important result will be  some idea of the robustness  or ``toughness''
of the numeric weighting  factors. If the system easily collapses, it
was too finely tuned initially.}

\numlistentry[3]{Add several new concept modules (including new heuristics
relevant to them) and
see if {\AM} can  work in some unanticipated field  of mathematics (like
graph theory or calculus or plane geometry).
Do earlier achievements---concepts and conjectures {\AM}
synthesized already---have  any
impact in  the new domain?  Are some specialized heuristics  from the
first  domain totally wrong  here? Do all the  old general heuristics
still  hold  here?  Are  they  sufficient,  or   are  some  ``general''
heuristics  needed here which  weren't needed  before? Does  {\AM} ``slow
down'' as more and more concepts get introduced?}

\numlistentry[4]{Try to have {\AM} 
develop nonmathematical theories  (like elementary physics, or program
verification).  This  might  require  limiting  {\AM}'s  freedom to
``ignore  a  given  body  of  data  and  move  on  to  something  more
interesting.'' The exploration of  very non-formalizable fields (\eg,
politics) might require much more than a small augmentation of {\AM}'s
base of concepts. For some such domains, the ``Intuitions'' scheme, which had to
be abandoned for math, might prove valid and valuable.}

\numlistentry[5]{Add several  new concepts dealing with  proof, and of course  add
all the associated heuristic rules. Such rules would advise {\AM} on the
fine points of  using various techniques  of proof/disproof: when  to
use them, what to try next based on why the last attempt failed, etc.
See if the \4kinds\0 of discoveries {\AM} makes are increased.}

\listend

Several experiments (of  types 1, 2, and 3 above) were
set up and performed on  {\AM}. We're now ready to examine each  of them
in detail.  The following points are covered for each experiment:

\listbegin

\numlistentry[1]{How was it thought of?}

\numlistentry[2]{What will be gained by  it?  What would be the implications of the
various possible outcomes?}

\numlistentry[3]{How was the experiment set up? What preparations/modifications had
to be made? How much time (man-hours) did it take?}

\numlistentry[4]{What happened?  How did {\AM}'s  behavior change? Was  this expected?
Analysis.}

\numlistentry[5]{What was learned from  this experiment? Can  we conclude anything
which suggests new  experiments (\eg,  use a better  machine, a  new
domain) or which bears on a  more general problem that {\AM} faced (\eg,
a new way to teach math, a new idea about doing math research)?}

\listend
\subsectionbegin[2.1]{Must the Worth numbers be finely tuned?}
Each of the 115 initial concepts 
has  supplied to it (by  the author) a number  between 0
and 1000,  stored as its Worth facet,  which is supposed to represent
the overall  value of  the concept.  ``Compose'' has  a higher  initial
Worth than ``Structure-delete,'' which is  higher than \ffootnote{``Equality''.}{As {\AM}
progresses, it notices something interesting about Equality every now
and then, and pushes its  Worth value upwards.}

Frequently, the priority of a task involving C depends on the overall
Worth of C. 
How  sensitive is
{\AM}'s  behavior to  the initial  settings of  the Worth  facets?   How
finely tuned must these initial Worth values be?

This experiment was thought of because of the ``brittleness'' of many other
AI systems, the amount of fine tuning needed to elicit coherent behavior.
For example, see the discussion of limitations of PUP6, in  Lenat[75b].
The author believed that {\AM} was very resilient in this regard, and that
a demonstration of that fact would increase credibility in the power of the ideas
which {\AM} embodies.


To test this, a simple experiment was performed. Just before starting
{\AM}, the mean value of all concepts' Worth values was computed. It turned out to be
roughly 200. Then  each concept had its Worth reset to  the value \footnote{200.}{The
initial spread of values had previously been from 100 to 600. }
This
was done ``by hand,'' by  the author, in a  matter of seconds.  {\AM}  was
then started and run as if  there were nothing amiss, and its behavior
was watched carefully.

What happened?  By and large, the same major
discoveries were made---and missed---as usual,
in  the same order as usual.  But  whereas {\AM}
proceeded fairly smoothly before, with little superfluous activity, it
now wandered quite blindly  for long periods  of time, especially  at
the  very beginning.  Once  {\AM} ``hooked  into''  a line  of  productive
development,  it  followed  it  just  as  always, with  no  noticeable
additional wanderings.  As one of  these lines  of developments  died
out, {\AM} would wander around again, until the next one was begun.

It took roughly three times as long for each major discovery to occur
as  normal.  This ``delay''  got shorter  and  shorter as  {\AM} developed
further.   In  each  case,  the  tasks preceding  the  discovery  and
following  it were pretty  much the  same as  normal; only  the tasks
``between'' two periods of development were different---and much  more
numerous.  

The  reader may be interested to learn  that the Worth values of many
of the  concepts---and most of the new concepts---ended up  very
close  to the same  values that  they achieved  in the  original run.
Overrated concepts were  investigated and  proved boring;  underrated
concepts had  to wait longer  for their  chances, but then quickly  proved
interesting and had their Worth facets boosted.

The conclusion I  draw from this change in behavior is that the Worth
facets are useful for making blind decisions---where {\AM}  must choose
based  only on  the overall  worths of  the various  concepts in  its
repertoire.  Whenever  a specific  reason  existed, it  was  far more
influential than  the ``erroneous''  Worth values.   The close,  blind,
random decisions occur  between long bursts of specific-reason-driven
periods of creative \footnote{work.}{Incidentally, GPS behaved just this same way.
See, \eg,  Newell&Simon[72]. }

The general answer, then, is \4No\1, the initial settings of the Worth
values are not crucial. Guessing reasonable initial values for them is
merely a time-saving device.
This suggests
an interesting research problem: what impact does
the \4quality\0 of initial starting values have on humans? Give several bright
undergraduate math majors the same set of objects and operators to play with,
but tell some of them ({\it i\/}) nothing, and some of them ({\it ii\/}) a certain few pieces of
the system are very promising, then ({\it iii\/}) 
emphasize a different subset of the objects and operators.
How does ``misinformation'' impede the humans? How about no information?
Have them give verbal protocols about where they are focussing their attention,
and why.

Albeit at a  nontrivial cost, the  Worth facets did manage  to correct
themselves  by the  end of  a \footnote{long}{A couple  cpu hours,  about a
thousand tasks  total selected from  the agenda.} run.   What  would
happen  if the  Worth  facets of  those 115  concepts  were not  only
initialized  to 200, but were  held fixed at 200  for the duration of
the run?

In this  case, the  delay still subsided with time.  That is,  {\AM}
still got more and more ``back to normal'' as it progressed onward. The reason
is  because  {\AM}'s  later  work  dealt  with  concepts  like   Primes,
Square-root, etc., which were so far removed  from the initial base of
concepts that the initial concepts' Worths were of little consequence.

Even  more drastically, we  could force all  the Worth  facets of all
concepts---even newly created ones---to be  kept at the value  200
forever. In this case,  {\AM}'s behavior doesn't completely disintegrate,
but  that delay factor  actually increases with  time: apparently, {\AM}
begins to suffer from the exponential growth of ``things to do'' as its
repertoire  of  concepts  grows  linearly.   Its  purposiveness,  its
directionality depends on ``focus of attention'' more and more, and  if
that feature is removed,  {\AM} loses much of its rationality.   A factor
of 5  delay doesn't sound that bad  ``efficiency-wise,'' but the actual
apparent  behavior of  {\AM}  is  as  staccato  bursts  of  development,
followed  by wild  leaps  to  unrelated concepts.  {\AM}  no longer  can
``permanently'' record its interest in a certain concept.

So  we conclude that the  Worth facets are ({\it i\/})  not finely tuned, yet
({\it ii\/}) provide important  global information about the  relative values
of  concepts.   If  the  Worth facets  are  completely disabled,  the
rationality of {\AM}'s behavior hangs on the slender thread of ``focus of
attention.''

\subsectionbegin[2.2]{How Finely Tuned is the Agenda?}
The top few candidates  on the agenda always appear  to be reasonable
(to me).   If I work with  the system, guiding it, I  can cause it to
make a few discoveries it wouldn't otherwise make, and I can cause it
to make its typical ones much faster  (by about a factor of two). Thus the
\4very\0 top task is not always the best.

If  {\AM} randomly selects one of  the top 20 or  so tasks on the agenda
each time, what  will happen to  its behavior? Will it  disintegrate,
slow down by a factor of ten, slow down slightly, etc.

This experiment required only a few seconds to set up, but demanded a
familiarity with  the  LISP  functions which  make  up  {\AM}'s  control
structure.   At  a  certain  point,  {\AM} asks  for  Best-task(Agenda).
Typically,  the LISP function  Best-task is  defined as CAR---\ie,
pick the first  member from the  list of tasks.   What  I did was  to
redefine Best-task as  a function which randomly selected  n from the
set  $\{1,2,\cdots ,20\}$,  and  then  returned  the  n$↑{th}$  member of  the
job-list.

If you watch the top job on the agenda, it will  take about 10 cycles
until  {\AM} chooses  it.   And yet  there are  many  good, interesting,
worthwhile jobs sprinkled  among the top  20 on the  agenda, so  {\AM}'s
performance is cut  by merely a factor of  three, as far as cpu  time per
given major  discovery.  Part of this  better-than-20 behavior is due
to the fact  that the eighteenth  best task had  a much lower  priority
rating than the  top few; hence it was allocated much  less cpu time for
its  quantum  than the  top  task would  have received.    Whether it
succeeded or  failed, it  used up  very little  time.   Since {\AM}  was
frequently  working on a  low-value task,  it was unwilling  to spend
much time or space on it. So the mean time allotted per task fell  to
about 15 seconds (from the typical 30  secs). Thus, the ``losers'' were
dealt  with quickly,  so the  detriment  to cpu-time  performance was
softened.


To carry this  investigation further, another  experiment was  carried
out. {\AM} was forced to alternate between choosing  the top task on the
agenda,  and a randomly chosen one.   Although its  rate of discovery
was cut by less than half, its behavior was almost as  distasteful to
the user as in the last (always-random) experiment.


{\bf Conclusion:} Selecting (on the  average) the tenth-best candidate
impedes pro\-gress  by a factor of less than ten (about a factor of three), but
it dramatically  degrades the  ``sensibleness'' of  {\AM}'s behavior,  the
continuity  of its actions.   Humans  place a  big value  on absolute
sensibleness, and believe that doing something silly 50\%\  of the  time
is \4much\0  worse than half  as productive  as always doing  the
next most logical task.

{\bf Corollary:} Having  20 multi-processors simultaneously execute the top
20 jobs will  increase the rate  of ``big'' discoveries,
but not by  a
full factor of twenty---more like a factor of three.

Another experiment in this same vein was done, one which was designed
to  be far more crippling to {\AM}.  Be-threshhold was held at 0 always,
so \4any\0  task which  ever got  proposed  was kept  forever on  the
agenda, no matter  how low its priority.   The Best-task function was
modified so it randomly selected any member of the list of jobs. As a
final insult, the Worth  facets of all the concepts  were initialized
to 200 before starting {\AM}.

{\bf Result:}  Many ``explosive'' tasks  were chosen,  and the number  of new
concepts increased rapidly.   As  expected, most of  these were  real
``losers.''  There  seemed no rationality to {\AM}'s  sequence of actions,
and  it was  quite  boring to  watch it  floundering so.  The typical
length of the agenda was about 500, and {\AM}'s performance was ``slowed''
by at least  a couple orders of magnitude.  A more subjective measure
of its ``intelligence'' would say that it totally collapsed under  this
random scheme.

{\2Conclusion:\0}  Having   an  unlimited   number  of   processors
simultaneously execute all  the jobs on the agenda would increase the
rate at which {\AM} made  big discoveries, at an ever-accelerating  pace
(since the length of the agenda would grow exponentially).

Having a uniprocessor \4simulate\0  such parallel processing would be
a losing  idea, however. The truly ``intelligent'' behavior {\AM} exhibits
is its plausible sequencing of tasks.

\subsectionbegin[2.3]{How Valuable is Tacking Reasons onto Each Task?}
Let's dig inside  the agenda  scheme now.   One idea I've repeatedly emphasized
is the attaching of reasons to  the tasks on the agenda,
and using  those reasons  and their  ratings to  compute the  overall
priority value  assigned to each  task.  An  experiment was  done to
ascertain  the amount  of intelligence that  was emanating  from that
idea.

The global  formula  assigning  a  priority value  to  each  job  was
modified. We let it  still be a function of the  reasons for the job,
but we ``trivialized'' it: the priority of a job was computed as simply
the number of reasons it  has (normalized by multiplying by  100, and
cut-off if over 1000).

This raised the  new question of what to do  if several jobs all have
the same  priority.   In that case, I had {\AM} execute  them  in
stack-order (most  recent job tackled \ffootnote{first).}{Why?  Because
({\it i\/}) it  sounds right
intuitively to me, ({\it ii\/}) this is akin to human focus of attention, and
mainly because  ({\it iii\/})  this is  what {\AM}  did  anyway, with  no  extra
modification. }

{\bf Result:}  I  secretly  expected  that  this  wouldn't  make  too  much
difference  on {\AM}'s  apparent level  of directionality, but  such was
definitely not the case.   While {\AM} started by doing tasks  which were
far more interesting and  daring than usual (\eg, filling in various
Coalescings right away),  it soon  became obvious that  {\AM} was  being
swayed by hitherto trivial coding decisions.   Whole classes of
tasks---like Checking Examples  of C---were never  chosen, because they
only had one or two reasons supporting them.  Previously, one  or two
good reasons  were sufficient. Now,  tasks with several  poor reasons
were  rising to the  top and being  worked on. Even  the LIFO (stack)
policy for resolving ties didn't keep {\AM}'s attention focussed.

\2Conclusion:\0 Unless a conscious effort is made  to ensure that
each  reason really will  carry roughly  an equal amount  of semantic
impact (charge, weight), it is not acceptable merely to choose  tasks
on  the  basis of  how  many  reasons  they possess.  Even  in  those
constricted  equal-weight  cases,  the  similarities between  reasons
supporting a task should be taken into account.


\subsectionbegin[2.4]{What If Certain Concepts Are Eliminated or Added?}
Feeling  in  a  reckless mood  one  day,  I  eliminated  the  concept
``Equality'' from {\AM} to see what it would then do.  Equality was a key
concept,  because  {\AM}  discovered   Numbers  via  the  technique   of
generalizing  the  relation ``Equality''  (exact  equality  of 2  given
structures,  at  all  internal  levels).   What  would  happen  if we
eliminate this  path?  Will  {\AM} rederive  Equality?   Will it get  to
Cardinality via another route? Will it do some set-theoretic things?

{\bf Result:}  Rather disappointing. {\AM}  never did rederive  Equality or
Car\-di\-na\-li\-ty.  It spent its time thrashing about with various  flavors
of data-structures (unordered \vs\  ordered, multiple-elements allowed
or  not, etc.),  deriving large  quantities  of boring  results about
them. Very many composings and coalescings were done, but no exciting
new operations were produced.


A kinder type of  experiment would be  to \4add\0 a  few concepts.
One  such  experiment was  done: the  addition  of Cartesian-product.
This operation,  named  C-PROD, accepts  two  sets as  arguments  and
returns a third set as its  value: the Cartesian product of the first
two.

{\bf Result:}  The only significant change in  {\AM}'s behavior was that TIMES
was discovered first as the restriction of  C-PROD to Canonical-Bags.
When it  soon was rediscovered in  a few other guises,  its Worth was
even higher than usual.  {\AM}  spent even more time exploring  concepts
concerned with it, and deviated much less for quite a long time.

{\bf Synthesis of the above experiments:} It appears  that {\AM} may really be
more specialized than expected; {\AM} may be able to forge  ahead 
only along one or two
main lines  of development---at  least if  we demand  it make  very
interesting,  well-known  discoveries  quite  frequently.    Removing
certain  key concepts  can be disastrous.  On the  other hand, adding
some carefully chosen  new  ones  can  greatly  enhance   {\AM}'s
directionality (hence its apparent intelligence).

{\bf Conclusion:}  In its current state,
{\AM} is thus  seen to  be \4minimally competent:\0  if any
knowledge is  removed,  it appears much less intelligent;  if any  is  added,  it
appears slightly smarter.

{\bf Suggestion  for future research:}
A hypothesis, which should be tested experimentally, is that the importance
of the presence of each individual 
concept  decreases as the number of---and \4depth\0 of---the 
synthesized concepts increases.
That is, any excision would eventually ``heal over,'' given enough time.
The failure of  {\AM} to verify this may be due to the relatively small amount of
development in toto (an hour of cpu time, a couple hundred new concepts, a few
levels deeper than the starting ones).

\subsectionbegin[2.5]{Can AM Work in a New Domain: Plane Geometry?}\par

\epigraph{A true strategy should be domain-independent.}
\author{Adams}
\epigraphend

\noindent As McDermott  points  out [McDermott 76],  just labelling  a
bunch of  heuristics ``Operation heuristics'' doesn't suddenly make them
relevant to any operation; all it  does is give that impression to  a
human who  looks at  the code (or  a description of  it).   Since the
author  hoped that the  labelling really was fair,  an experiment was
done to test this.   Such an experiment would be a key  determiner of
how general {\AM} is.

How  might one  demonstrate  that the  ``Operation''  heuristics really
could be useful  or dealing  with any  operation, not  just the  ones
already in {\AM}'s initial base of concepts?

One  way would  be  to  pick a  new  domain,  and  see how  many  old
heuristics  contribute to---and  how many new heuristics  have to be
added to elicit---some  sophisticated behavior in  that domain.  Of
course,  some new  primitive  concepts would  have  to be  introduced
(defined) to {\AM}.

Only  one experiment of this type was  attempted.  The author added a
new base  of concepts  to the  ones already  in {\AM}.   Included  were:
Point, Line, Angle, Triangle, Equality of points/lines/angles/triangles.
These simple plane  geometry notions were  sufficiently removed  from
set-theoretic ones that those preexisting specific concepts would be
totally  irrelevant; on the other  hand, the general  concepts---the
ones with the heuristics attached---would still be just as relevant:
Any-concept, Operation, Predicate, Structure, etc.

For each  new geometric  concept, the  only facet  filled in  was its
Definition.  For the new predicates and operators, their Domain/range
entries were also supplied.
No new heuristics were added to {\AM}.

{\bf Result:} Fairly good behavior.   {\AM} was able to find examples  of all
the  concepts defined, and  to use  the character  of the  results of
those examples searches to  determine intelligent courses of  action.
{\AM} derived congruence and similarity of  triangles, and several other
well-known  simple  concepts.  An  unusual  result  was the  repeated
derivation of the idea  of ``timberline.'' This  is a predicate on  two
triangles: Timberline(T1,T2) iff  T1 and T2 have a  common angle, and
the side opposite that angle in the two triangles are parallel:

\vfill
\ctrline{(Timberline Diagram)}
\eject

%\vskip 3in % Insert the timberline diagram here.

Since {\AM} kept rederiving  this in new ways, it  seems surprising that
there is no very  common name for the concept. It could be that {\AM} is
using techniques which humans don't---at least, for geometry.

The only new bit of knowledge that came out of this experiment was  a
``use'' for  Goldbach's conjecture:  any angle  (0--180$\deg$) can  be
built  up (to  within 1$\deg$) as the  sum of  two angles  of prime
degrees (<180$\deg$). This result is admittedly esoteric at best,  but is
nonetheless worth reporting.

The total effort expended on this  experiment was: a few months of
subconscious processing,  ten hours of designing the base of concepts
to insert,  ten hours inserting  and debugging  them. The whole  task
took about two days of real time.

The conclusion to be drawn is that heuristics really can be generally
useful; their  attachment  to  general-sounding concepts  is  not  an
\ffootnote{illusion.}{Or  it's a very good illusion! But note: If this phenomenon is repeatable
and useful, then (like Newtonian mechanics) it won't pragmatically 
matter whether it's only
an approximation to reality.}
The  implication  of  this  is  that  {\AM}  can  be  grown
incrementally,  domain by domain.   Adding expertise in  a new domain
requires only the introduction of concepts local to that  domain; all
the very  general concepts---and their heuristics---already exist
and can be used with no change.


The  author feels  that  this result  can be  generalized: {\AM}  can be
expanded in scope,  even to nonmathematical  fields of endeavor.  In
each  field, however, the  rankings of  the various  \footnote{heuristics}{The
numeric values  that 
should be returned by
the  local ratings  formulae  which are attached  to  the
heuristic rules. }  may shift  slightly. As the  domain
gets further  away from mathematics, various heuristics are important
which were ignorable  before (\eg, those  dealing with ethics),  and
some pure  math research-oriented  heuristics become  less applicable
(``giving  up and  moving on  to another  topic'' is not  an acceptable
response to the 15-puzzle, nor to a hostage situation).

Well, it sounds as if we've shifted our orientation from ``Results'' to
a subjective  evaluation of those results. Let's  start a new chapter
to legitimize this type of commentary.

\worldend